Challenges in Managing Implicit and Abstract Provenance Data: Experiences with ProvManager
Authors
Abstract
Running scientific workflows in distributed and heterogeneous environments has motivated provenance-gathering approaches that are loosely coupled to workflow management systems. We have developed a provenance management system named ProvManager that manages provenance data in such environments independently of any specific Scientific Workflow Management System. Our experience using ProvManager in real workflow applications has revealed many provenance management issues that current related work does not address. In particular, we have faced challenges such as the need to deal with implicit provenance data and the lack of higher levels of provenance abstraction. This paper discusses these challenges and points to directions for addressing them, contextualizing them according to our experience in developing ProvManager.
Similar resources
Managing Provenance in Scientific Workflows with ProvManager
Running scientific workflows in distributed environments is motivating the definition of provenance gathering approaches that are loosely coupled to the workflow systems. We have proposed a provenance gathering strategy that is independent from workflow system technology. This strategy has evolved into a provenance management system named ProvManager. The main principle is that each workflow ac...
Integrating Provenance Data from Distributed Workflow Systems with ProvManager
Running scientific workflows in distributed environments is motivating the definition of provenance gathering approaches that are loosely coupled to the workflow execution engine. This kind of approach is interesting because it allows both storage and access to provenance data in an integrated way, even in an environment where different workflow management systems work together. Therefore, we h...
Issues in Building Practical Provenance Systems
The importance of maintaining provenance has been widely recognized, particularly with respect to highly-manipulated data. However, there are few deployed databases that provide provenance information with their data. We have constructed a database of protein interactions (MiMI), which is heavily used by biomedical scientists, by manipulating and integrating data from several popular biological...
On the Use of Semantic Abstract Workflows Rooted on Provenance Concepts
Two challenges related to capturing provenance about scientific data are: 1) determining an adequate level of granularity to encode provenance, and 2) encoding provenance in a way that facilitates end-user interpretation and analysis. A solution to address these challenges consists in integrating two technologies: Semantic Abstract Workflows (SAWs), which are used to capture a domain expert’s un...
Scaling SPADE to "Big Provenance"
Provenance middleware (such as SPADE) lets individuals and applications use a common framework for reporting, storing, and querying records that characterize the history of computational processes and resulting data artifacts. Previous efforts have addressed a range of issues, from instrumentation techniques to applications in the domains of scientific reproducibility and data security. Here we...